First, we will look at salaries over time.

Note that salary information is only available from 1985 forward.

min(salaries$yearID)
## [1] 1985

We can observe salary trends over time using player salary data from the Lahman dataset and US household income data from the US Census Bureau.

Using the respective 1985 values as a baseline, we see that US median household income has increased by around 25% every 5 years. Growth in median player salary generally outpaced the income growth of US households until the past decade. Average (mean) player salary, in contrast, has increased at a greater rate since 1990. This suggests that salary trends for “top” players may differ from those for “average” players.
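A minimal sketch of the baseline comparison, using made-up numbers rather than the Lahman or Census data: each series is divided by its own 1985 value, so both start at 1 and can be compared on one scale. The data frame toy, its columns, and the helper index_to_base are hypothetical names introduced for illustration.

```r
# Hypothetical values for illustration only; not the real salary or income data
toy <- data.frame(
  year      = c(1985, 1990, 1995),
  player    = c(400000, 600000, 1000000),
  household = c(20000, 25000, 30000)
)

# Divide each series by its value in the baseline year
index_to_base <- function(values, years, base_year = 1985) {
  values / values[years == base_year]
}

toy$player_idx    <- index_to_base(toy$player, toy$year)
toy$household_idx <- index_to_base(toy$household, toy$year)
toy
```

An index of 1.25 then reads as 25% growth since 1985, whatever the original units were.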

Observing the trends by percentile, we can see that salaries have indeed not increased at the same rate across the board: the top-paid players saw large gains, while players in the lower percentiles saw little change. Next, we compare percentiles by team.
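As a rough sketch of how percentiles per season can be computed in base R, the block below uses synthetic, right-skewed salaries. The data frame toy and the chosen percentiles are assumptions for illustration, not the values behind the plots described here.

```r
set.seed(42)
# Synthetic, right-skewed salaries for two seasons (illustration only)
toy <- data.frame(
  yearID = rep(c(1985, 2015), each = 200),
  salary = c(rlnorm(200, meanlog = 12.5, sdlog = 0.8),
             rlnorm(200, meanlog = 13.5, sdlog = 1.4))
)

probs <- c(0.10, 0.50, 0.90)
# One row per season, one column per percentile
pct_by_year <- t(sapply(split(toy$salary, toy$yearID), quantile, probs = probs))
pct_by_year
```

Comparing the columns across rows shows whether the upper percentiles pull away from the lower ones over time.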

On a team-by-team basis, we can see that salary growth was not the same for the median and the 90th salary percentile. In particular, median salary decreased between 1991 and 1995, while the 90th percentile did not change. What may have caused this?

Using the same data to plot how much more the 90th percentile was receiving (the p90rat variable in the models below), we can see an increasing trend until 1995, the year after the baseball strike of 1994, and a decreasing trend afterwards. Again, percentiles were calculated by team.
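The exact definition of p90rat is not shown here. One plausible reading, and it is only an assumption, is the 90th-percentile salary divided by the median salary for each team and season. A sketch on synthetic data:

```r
set.seed(7)
# Synthetic salaries for two hypothetical teams in two seasons
toy <- data.frame(
  yearID = rep(c(1990, 1995), each = 50),
  teamID = rep(rep(c("AAA", "BBB"), each = 25), times = 2),
  salary = rlnorm(100, meanlog = 13, sdlog = 1)
)

# Assumed definition: 90th-percentile salary divided by the median, per team and season
ratio_fun <- function(s) unname(quantile(s, 0.90) / median(s))
p90 <- aggregate(salary ~ teamID + yearID, data = toy, FUN = ratio_fun)
names(p90)[names(p90) == "salary"] <- "p90rat"
p90
```

Under this definition a value of 5 would mean that a team's 90th-percentile player earns five times its median player.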

Is this a real trend? Let’s check with a piecewise linear spline regression.

## 
## Call:
## lm(formula = p90rat ~ yearID + yearID * (yearID > 1994.5), data = salteam)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.4831 -2.6670 -0.7435  1.4938 21.7437 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -1.404e+03  1.629e+02  -8.620   <2e-16 ***
## yearID                      7.081e-01  8.185e-02   8.652   <2e-16 ***
## yearID > 1994.5TRUE         1.528e+03  1.697e+02   9.005   <2e-16 ***
## yearID:yearID > 1994.5TRUE -7.664e-01  8.526e-02  -8.989   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.847 on 914 degrees of freedom
## Multiple R-squared:  0.1634, Adjusted R-squared:  0.1607 
## F-statistic:  59.5 on 3 and 914 DF,  p-value: < 2.2e-16
## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.lm(obj = splfit2, seg.Z = ~yearID, psi = 1994)
## 
## Estimated Break-Point(s):
##      Est.   St.Err 
## 1994.396    0.723 
## 
## Meaningful coefficients of the linear terms:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.404e+03  1.629e+02  -8.620   <2e-16 ***
## yearID       7.081e-01  8.185e-02   8.652   <2e-16 ***
## U1.yearID   -7.664e-01  8.526e-02  -8.989       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.847 on 914 degrees of freedom
## Multiple R-Squared: 0.1634,  Adjusted R-squared: 0.1607 
## 
## Convergence attained in 2 iterations with relative change 0
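The lm formula in the first output fits a separate intercept and slope on each side of 1994.5. The sketch below shows how the two slopes can be read off such a fit. It uses synthetic data built to resemble the reported coefficients, and the names toy, post, and ratio are hypothetical.

```r
set.seed(1)
year <- rep(1985:2016, each = 5)
post <- year > 1994.5
# Synthetic ratio: rises 0.7 per year before the break, then drifts down by 0.05 per year
ratio <- ifelse(post, 12 - 0.05 * (year - 1995), 5 + 0.7 * (year - 1985)) +
  rnorm(length(year), sd = 0.5)
toy <- data.frame(year, post, ratio)

# year * post expands to year + post + year:post
fit <- lm(ratio ~ year * post, data = toy)
b <- coef(fit)

slope_before <- b[["year"]]                         # slope up to the break
slope_after  <- b[["year"]] + b[["year:postTRUE"]]  # slope after the break
c(before = slope_before, after = slope_after)
```

Applied to the coefficients reported above, the same arithmetic gives a slope of about 0.708 before the break and 0.708 - 0.766 = -0.058 after it.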

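The segmented fit estimates the breakpoint at 1994.4 (standard error 0.72), consistent with a change in trend around the strike. A regression discontinuity model also illustrates this break.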

## 
## Call:
## RDestimate(formula = p90rat ~ yearID, data = salteam, cutpoint = 1994.5)
## 
## Type:
## sharp 
## 
## Estimates:
##            Bandwidth  Observations  Estimate  Std. Error  z value
## LATE        6.049     334           2.872     1.0894      2.636  
## Half-BW     3.024     166           4.335     1.6445      2.636  
## Double-BW  12.097     618           1.708     0.7535      2.266  
##            Pr(>|z|)    
## LATE       0.008394  **
## Half-BW    0.008395  **
## Double-BW  0.023436  * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## F-statistics:
##            F      Num. DoF  Denom. DoF  p        
## LATE       34.93  3         330         0.000e+00
## Half-BW    11.26  3         162         1.901e-06
## Double-BW  74.99  3         614         0.000e+00
#plot rdd fit
plot(rddfit)
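For intuition, a sharp regression discontinuity estimate can be approximated in base R by fitting a local linear model on each side of the cutpoint and reading off the jump. This is a simplified sketch on synthetic data: it uses a fixed bandwidth and no kernel weights, so it is not a reimplementation of RDestimate, and sharp_rd is a hypothetical helper.

```r
set.seed(3)
n <- 2000
x <- runif(n, 1985, 2016)
cutpoint <- 1994.5
# Synthetic outcome with a true jump of 3 at the cutpoint
y <- 10 + 0.2 * (x - cutpoint) + 3 * (x > cutpoint) + rnorm(n, sd = 0.5)

# Local linear fit within a bandwidth of the cutpoint, separate slopes on each side
sharp_rd <- function(x, y, cutpoint, bandwidth) {
  keep <- abs(x - cutpoint) <= bandwidth
  d <- data.frame(xc = x[keep] - cutpoint,
                  treated = x[keep] > cutpoint,
                  y = y[keep])
  fit <- lm(y ~ xc * treated, data = d)
  # The coefficient on treated is the estimated jump at the cutpoint
  coef(summary(fit))["treatedTRUE", c("Estimate", "Std. Error")]
}

est <- sharp_rd(x, y, cutpoint, bandwidth = 6)
est
```

Refitting with half and double the bandwidth, as in the output above, is a common way to check that the estimated jump does not depend on one bandwidth choice.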

In 1994, the owners proposed a two-pronged plan to limit the effect of reduced revenues. The first prong was revenue sharing, intended to keep smaller teams from having to drop out. The second was a salary cap, to limit the labor cost of players. Although the salary cap was not adopted, revenue sharing was. Was it effective?

Let’s look at how team/franchise revenue has changed over time. Data comes from Michael Ozanian, a writer for Forbes, who has been compiling annual financial data for Major League Baseball since 1990. Numbers have been adjusted to 1990 dollars using the Consumer Price Index (CPI).
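The CPI adjustment can be sketched as follows. The index values and revenue figures are invented for illustration and are not actual CPI or Forbes numbers; to_base_dollars is a hypothetical helper.

```r
# Hypothetical index values for illustration; not actual CPI figures
cpi <- data.frame(year = c(1990, 2000, 2010), index = c(100, 130, 165))
revenue <- data.frame(year = c(1990, 2000, 2010), nominal = c(50, 130, 220))

# Rescale nominal amounts so that the base year has a deflator of 1
to_base_dollars <- function(nominal, year, cpi, base_year = 1990) {
  base_index <- cpi$index[cpi$year == base_year]
  nominal * base_index / cpi$index[match(year, cpi$year)]
}

revenue$real1990 <- to_base_dollars(revenue$nominal, revenue$year, cpi)
revenue
```

With these toy numbers, 130 in year-2000 dollars becomes 100 in 1990 dollars, so growth measured in real terms is smaller than the nominal figures suggest.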

We see that there is a drop in revenues in the year of the strike, but this recovers after a few years. Note that the 2012 revenue data is incomplete and skewed upwards. What about franchise values?

We can see that the strike did not drastically affect franchise values. As revenues recovered after the strike, valuations rose sharply following the revenue sharing agreement, and again after 2010 (note that the 2012 valuation data is complete). Let’s look back at revenues.

We can see that teams mostly have increasing revenues year-on-year, with the biggest increases in 1997 and 2014. That’s good, but the big point of the revenue sharing agreement was to prevent losses. Was this accomplished?

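Let’s check with a regression on year and an ANOVA by collective bargaining agreement (CBA), first for teams with an operating loss (oloss) and then for teams that lost value (deval).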

## 
## Call:
## lm(formula = teams ~ Year, data = oloss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.28988 -0.10506  0.01310  0.05647  0.29085 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 29.710471   7.714400   3.851 0.000767 ***
## Year        -0.014688   0.003852  -3.813 0.000845 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1473 on 24 degrees of freedom
## Multiple R-squared:  0.3772, Adjusted R-squared:  0.3513 
## F-statistic: 14.54 on 1 and 24 DF,  p-value: 0.0008448
## 
## Call:
## lm(formula = teams ~ Year, data = deval)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.279949 -0.084264 -0.006249  0.103421  0.230952 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 32.617231   8.244450   3.956 0.000627 ***
## Year        -0.016183   0.004116  -3.932 0.000666 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1484 on 23 degrees of freedom
## Multiple R-squared:  0.402,  Adjusted R-squared:  0.376 
## F-statistic: 15.46 on 1 and 23 DF,  p-value: 0.0006664

Both fits show a decreasing trend: the share of teams with an operating loss and the share of teams losing value each fall by roughly 1.5 percentage points per year (slopes of -0.0147 and -0.0162), or about 30 percentage points over 20 years.
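The conversion from the reported slopes to percentage points is simple arithmetic. It assumes that the response teams is a proportion between 0 and 1, which is what the size of the residuals suggests.

```r
# Slopes reported in the two fits above (change in the proportion of teams per year)
slopes <- c(oloss = -0.014688, deval = -0.016183)

per_year_pp <- slopes * 100       # percentage points per year
over_20y_pp <- slopes * 20 * 100  # percentage points over 20 years
round(rbind(per_year_pp, over_20y_pp), 1)
```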

##                Df Sum Sq Mean Sq F value Pr(>F)   
## as.factor(cba)  4 0.3586 0.08966   6.016 0.0024 **
## Residuals      20 0.2981 0.01490                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = teams ~ as.factor(cba), data = oloss)
## 
## $`as.factor(cba)`
##              diff        lwr         upr     p adj
## 8-7    0.04963370 -0.1740707  0.27333810 0.9618847
## 9-7   -0.09322344 -0.3515350  0.16508815 0.8145533
## 10-7  -0.27655678 -0.5348684 -0.01824518 0.0322181
## 11-7  -0.17322344 -0.4182793  0.07183245 0.2523730
## 9-8   -0.14285714 -0.3665615  0.08084726 0.3437435
## 10-8  -0.32619048 -0.5498949 -0.10248607 0.0024846
## 11-8  -0.22285714 -0.4311146 -0.01459968 0.0323258
## 10-9  -0.18333333 -0.4416449  0.07497826 0.2489516
## 11-9  -0.08000000 -0.3250559  0.16505589 0.8624128
## 11-10  0.10333333 -0.1417226  0.34838923 0.7164209
##                Df Sum Sq Mean Sq F value  Pr(>F)   
## as.factor(cba)  4 0.4607 0.11519    6.28 0.00214 **
## Residuals      19 0.3485 0.01834                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = teams ~ as.factor(cba), data = deval)
## 
## $`as.factor(cba)`
##              diff        lwr         upr     p adj
## 8-7   -0.31607143 -0.5917918 -0.04035104 0.0200669
## 9-7   -0.40714286 -0.7181974 -0.09608831 0.0069843
## 10-7  -0.34047619 -0.6515307 -0.02942164 0.0278618
## 11-7  -0.47714286 -0.7745679 -0.17971782 0.0009916
## 9-8   -0.09107143 -0.3404699  0.15832705 0.8053502
## 10-8  -0.02440476 -0.2738032  0.22499371 0.9982114
## 11-8  -0.16107143 -0.3932488  0.07110592 0.2660313
## 10-9   0.06666667 -0.2213139  0.35464722 0.9548525
## 11-9  -0.07000000 -0.3432023  0.20320234 0.9360068
## 11-10 -0.13666667 -0.4098690  0.13653568 0.5721522

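We can see that the measures taken in CBAs since the strike have reduced and kept down the number of teams losing value. In the Tukey comparisons for deval, every CBA since the strike (8 through 11) has a significantly lower share of teams losing value than CBA 7 (adjusted p < 0.03), with no significant differences among the later CBAs. For operating losses (oloss), only CBA 10 (versus 7 and 8) and CBA 11 (versus 8) differ significantly.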
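For readers who want to reproduce this kind of comparison, the sketch below runs a one-way ANOVA followed by Tukey's honest significant differences on synthetic proportions. The data frame toy and its values are invented; only the aov and TukeyHSD calls mirror the ones in the output above.

```r
set.seed(11)
# Synthetic yearly proportions under five agreements (illustration only)
toy <- data.frame(
  cba   = rep(7:11, each = 5),
  teams = c(rnorm(5, 0.60, 0.1), rnorm(20, 0.25, 0.1))
)

fit <- aov(teams ~ as.factor(cba), data = toy)
summary(fit)

# Pairwise differences between agreements with family-wise 95% intervals
tuk <- TukeyHSD(fit)
tuk[["as.factor(cba)"]]
```

Each row of the result is one pair of agreements. A pair differs significantly at the 5% level when its interval (lwr, upr) excludes zero.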

In 2002, a new collective bargaining agreement increased revenue sharing by 70% and added a luxury tax: https://www.sbnation.com/2010/8/30/1065675/8-30-2002-baseball-avoids-another

For a good history of the collective bargaining agreements, see https://www.fangraphs.com/tht/a-history-of-the-collective-bargaining-agreement-part-3/

So it seems like the measures accomplish the goal of keeping teams on good financial footing. Let’s see how they affect team composition.

totsals <- salteam %>% group_by(yearID) %>% mutate(mutot = mean(totsal),sdtot = sd(totsal))
totsals <- totsals %>% mutate(stdtot = (totsal-mutot)/sdtot)
totsals %>% ggplot() + geom_point(aes(yearID,stdtot,group=yearID),size=1) +
  geom_smooth(aes(yearID,stdtot))
## `geom_smooth()` using method = 'loess'
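The standardization above turns each team's payroll into a z-score within its season. A base R sketch of the same idea, on made-up payrolls, shows why this is useful: the second season is exactly six times the first, and the z-scores come out identical.

```r
# Made-up payrolls; the 2010 values are six times the 1990 values
toy <- data.frame(
  yearID = rep(c(1990, 2010), each = 4),
  totsal = c(10, 15, 20, 35, 60, 90, 120, 210) * 1e6
)

# ave() applies the function within each year and returns a vector aligned with the rows
zscore <- function(x) (x - mean(x)) / sd(x)
toy$stdtot <- ave(toy$totsal, toy$yearID, FUN = zscore)
toy
```

Because overall payroll inflation drops out, the standardized values describe how far a team sits from the league average in a given season.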

revsals <- revs %>% rename(yearID=Year)
revsals <- left_join(revsals,names,by=c("yearID","franchID"))
revsals <- left_join(revsals,totsals,by=c("yearID","teamID"))
revsals <- revsals %>% mutate(pctrev = totsal/(Revenue*1000000))
revsals %>% ggplot() + geom_boxplot(aes(yearID,pctrev,fill=factor(yearID))) +
  geom_vline(aes(xintercept=as.numeric(1994)),linetype=1,size=7,alpha=0.5, color="red") +
  geom_text(aes(x=1994,y=0.0),label="STRIKE",angle=90,hjust = 0,color="white") +
  labs(title="Percent of revenue spent on player payroll",x="Year",y="Payroll as a fraction of revenue") +
  scale_x_continuous(breaks = seq(1990,2015,5)) +theme(legend.position="none")
## Warning: Removed 21 rows containing non-finite values (stat_boxplot).

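WAR (wins above replacement) is a statistic that estimates how many more wins a player contributes than a replacement-level player would.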

warteam %>% ggplot() + geom_tile(aes(x=yearID,y=factor(franfct),fill=medWAR)) + 
  scale_x_continuous(breaks = seq(1985,2015,5)) + 
  #scale_y_discrete(labels = rev(allwar$franchID)) +
  scale_fill_gradient(low="#00007F", high="red", name="WAR") +
  geom_vline(aes(xintercept=1994),linetype=3,size=1, color="white") +
  labs(title="Median WAR by team",x="Year",y="Team/Franchise")

The teams with the highest WAR prior to 2000

warteam %>% ggplot() + geom_tile(aes(x=yearID,y=factor(franfct),fill=wpct)) + 
  scale_x_continuous(breaks = seq(1985,2015,5)) + 
  #scale_y_discrete(labels = rev(allwar$franchID)) +
  scale_fill_gradient(low="#00007F", high="red", name="Win\nPercentage") +
  geom_vline(aes(xintercept=1994),linetype=3,size=1, color="white") +
  labs(title="Win percentage by team",x="Year",y="Team/Franchise")

Win-loss ratio and rank do not appear to have been affected; they look as random as ever.

What kinds of players were paid more?

[1995]

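# Look up each team's teamID for the matching year and franchise
for (i in 1:nrow(revs)) {
  revs$TID[[i]] = names$teamID[which(names$yearID==revs$Year[[i]] & names$franchID==revs$franchID[[i]])]
}
# Attach the 1995 team name to each team's highest-paid player
for (i in 1:nrow(salmax95)) {
  salmax95$teamname[[i]] <- revs$Team[which(revs$Year==1995 & revs$TID==salmax95$teamID[[i]])]
}
salmax95$teamname <- as.character(salmax95$teamname)
# Format salary as a string in millions of dollars, rounded to one decimal place
salmax95 <- salmax95 %>% mutate(salchr = paste("$",round(salary/100000)/10,sep=""))
# Note: Salary is a character column here, so the sort is alphabetical rather than numeric
salmax95 %>% select(fullname,teamname,salchr) %>% rename(Name=fullname,Team=teamname,Salary=salchr) %>% arrange(desc(Salary))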
##                 Name      Team Salary
## 1      Cecil Fielder    Tigers   $9.2
## 2        Barry Bonds    Giants   $8.2
## 3         David Cone Blue Jays     $8
## 4        Ken Griffey  Mariners   $7.6
## 5       Frank Thomas White Sox   $7.2
## 6       Jeff Bagwell    Astros   $6.9
## 7       Mark McGwire Athletics   $6.9
## 8         Cal Ripken   Orioles   $6.7
## 9        Greg Maddux    Braves   $6.5
## 10     Kirby Puckett     Twins   $6.3
## 11     Lenny Dykstra  Phillies   $6.2
## 12      Barry Larkin      Reds   $5.9
## 13      Jose Canseco   Red Sox   $5.8
## 14   Bret Saberhagen      Mets   $5.6
## 15    Gary Sheffield   Marlins   $5.6
## 16        Will Clark   Rangers   $5.6
## 17     Jack McDowell   Yankees   $5.4
## 18 Darryl Strawberry   Dodgers   $5.3
## 19     Mark Langston    Angels     $5
## 20      Larry Walker   Rockies     $5
## 21       Greg Vaughn   Brewers   $4.9
## 22      Wally Joyner    Royals   $4.8
## 23        Tony Gwynn    Padres   $4.7
## 24   Dennis Martinez   Indians   $4.6
## 25          Ken Hill Cardinals   $4.5
## 26          Jay Bell   Pirates   $4.4
## 27        Mark Grace      Cubs   $4.4
## 28       Moises Alou     Expos     $3

The highest salary on each team in 1995 (shown above in millions of dollars) went to future Hall of Famers Ken Griffey Jr., Barry Larkin, Kirby Puckett, and Cal Ripken Jr., as well as other popular players of the time, including Barry Bonds, Jose Canseco, Mark McGwire, and Darryl Strawberry. So, it seems that the top salaries went to popular players, who were presumably also the players who played well.

Do better players make more?

#Previously calculated salary changes and performance changes year-to-year and saved in warplus.csv
warstats <- wars %>% filter(pitcher=="N")
#warstats <- wars
warstats <- warstats[c("playerID","yearID","teamID",
                       "WAA","WAA_off","WAA_def",
                       "WAR","WAR_off","WAR_def",
                       "prevWAA","prevWAA_off","prevWAA_def",
                       "prevWAR","prevWAR_off","prevWAR_def",
                       "dWAA","dWAA_off","dWAA_def",
                       "dWAR","dWAR_off","dWAR_def",
                       "OPS_plus","salary","bump","cut","dsal","pdsal")]
warstats <- warstats %>% filter(yearID > 1985)

warstats <- warstats %>% filter(dsal!=0)
warstats <- warstats %>% filter(abs(dsal)>1)

warcut <- warstats %>% filter(dsal<0)
warcut %>% ggplot() + geom_histogram(aes(as.numeric(prevWAA_off)),binwidth=0.05)

warbump <- warstats %>% filter(dsal>0)
warbump %>% ggplot() + geom_histogram(aes(as.numeric(prevWAA_off)),binwidth=0.05)

These histograms don’t say much. Let’s look at other trends.

warstats$pdsal[which(warstats$pdsal==Inf)] = NA

warstats %>% filter(pdsal < 40, dsal > -100000) %>% ggplot() + 
  geom_point(aes(prevWAA_off,dsal + abs(min(dsal))+1 )) +
  geom_smooth(aes(prevWAA_off,dsal + abs(min(dsal))+1 )) +
  scale_y_continuous(trans="log10")
## `geom_smooth()` using method = 'gam'

warstats %>% ggplot() + 
  geom_point(aes(prevWAA_off,dsal/1000000)) +
  geom_smooth(aes(prevWAA_off,dsal/1000000)) +
  scale_y_continuous(breaks=seq(-30,30,5)) +
  labs(x="WAA in previous year", y="Salary change ($, millions)")
## `geom_smooth()` using method = 'gam'

warstats %>% ggplot() + 
  geom_point(aes(prevWAR_off,dsal/1000000)) +
  geom_smooth(aes(prevWAR_off,dsal/1000000)) +
  scale_y_continuous(breaks=seq(-30,30,5)) +
  labs(x="WAR in previous year", y="Salary change ($, millions)")
## `geom_smooth()` using method = 'gam'

# warbump %>% filter(pdsal < 10000) %>% ggplot() + 
#   scale_y_continuous(trans="log10") +
#   geom_point(aes(as.numeric(prevWAA_off),pdsal))
# warcut %>% filter(pdsal < 10000) %>% ggplot() + 
#   geom_point(aes(as.numeric(prevWAA),pdsal))

Again, there isn’t much of a pattern. Let’s look at the regression.

What about for a single year?

warstats %>% filter(yearID==1991) %>% ggplot() + 
  geom_point(aes(prevWAR,salary/1000000)) +
  geom_smooth(aes(prevWAR,salary/1000000)) +
  scale_y_continuous(breaks=seq(-30,30,5),trans="log10") +
  labs(x="WAR in previous year", y="Salary ($, millions)")
## `geom_smooth()` using method = 'loess'
## Warning in self$trans$transform(breaks): NaNs produced

w2 <- warstats %>% filter(yearID==1991)
#get est and r^2 by year?
summary(lm(w2$dsal~w2$prevWAA_off))
## 
## Call:
## lm(formula = w2$dsal ~ w2$prevWAA_off)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1242335  -267712  -116604   142628  2142901 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      324225      23463  13.819   <2e-16 ***
## w2$prevWAA_off   165917      18256   9.088   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 435800 on 349 degrees of freedom
## Multiple R-squared:  0.1914, Adjusted R-squared:  0.1891 
## F-statistic: 82.59 on 1 and 349 DF,  p-value: < 2.2e-16
summary(lm(w2$dsal~w2$prevWAR_off))
## 
## Call:
## lm(formula = w2$dsal ~ w2$prevWAR_off)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1342358  -184112   -66506   103337  1976559 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      133148      27485   4.844 1.91e-06 ***
## w2$prevWAR_off   164169      12931  12.696  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 400800 on 349 degrees of freedom
## Multiple R-squared:  0.3159, Adjusted R-squared:  0.314 
## F-statistic: 161.2 on 1 and 349 DF,  p-value: < 2.2e-16
summary(lm(w2$dsal~w2$prevWAR))
## 
## Call:
## lm(formula = w2$dsal ~ w2$prevWAR)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1318664  -198423   -80652    88553  2004829 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   161395      26878   6.005 4.81e-09 ***
## w2$prevWAR    135519      11234  12.063  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 407100 on 349 degrees of freedom
## Multiple R-squared:  0.2943, Adjusted R-squared:  0.2922 
## F-statistic: 145.5 on 1 and 349 DF,  p-value: < 2.2e-16
summary(lm(w2$dsal ~ w2$prevWAR + w2$dWAR) )
## 
## Call:
## lm(formula = w2$dsal ~ w2$prevWAR + w2$dWAR)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1273967  -183512   -75252   103165  1839214 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   145008      27543   5.265 2.46e-07 ***
## w2$prevWAR    149332      12539  11.910  < 2e-16 ***
## w2$dWAR        31980      13246   2.414   0.0163 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 404300 on 348 degrees of freedom
## Multiple R-squared:  0.3059, Adjusted R-squared:  0.3019 
## F-statistic: 76.68 on 2 and 348 DF,  p-value: < 2.2e-16

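The previous year’s WAR and WAA are significant predictors of the 1991 salary change, but they explain only 19% to 32% of the variance. But what if this changes over time?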
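To see whether the fit changes over time, one option is to run the same regression for every season and keep the slope and adjusted R-squared. The sketch below does this in base R on synthetic data. The column names mirror the ones used above, but toy, fit_year, and by_year are hypothetical, and this is not necessarily how the regres table used below was built.

```r
set.seed(5)
# Synthetic player-seasons; column names mirror the ones used above
toy <- data.frame(
  yearID  = rep(1990:1994, each = 60),
  prevWAR = rnorm(300, mean = 1, sd = 2)
)
toy$dsal <- 150000 + 140000 * toy$prevWAR + rnorm(300, sd = 400000)

# Fit one regression per season and keep the slope and adjusted R-squared
fit_year <- function(d) {
  s <- summary(lm(dsal ~ prevWAR, data = d))
  data.frame(yearID = d$yearID[1],
             estimate = coef(s)["prevWAR", "Estimate"],
             adj.r.squared = s$adj.r.squared)
}
by_year <- do.call(rbind, lapply(split(toy, toy$yearID), fit_year))
by_year
```

The resulting table has one row per season, which is the shape needed for the ANOVA by CBA period.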

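# Assign each year to a collective bargaining agreement (CBA) period
addcba <- function(df){
  # Bins: up to 1993, 1994, 1995-2002, 2003-2006, 2007-2010, 2011 onward
  df$cba = .bincode(df$yearID,c(0,1993.5,1994.5,2002,2006,2010,10000))
  df$cba = ifelse(df$cba>2,df$cba+5,df$cba)   # bins 3-6 become CBAs 8-11
  df$cba = ifelse(df$cba==1,df$cba+6,df$cba)  # bin 1 becomes CBA 7
  df$cba = ifelse(df$cba==2,NA,df$cba)        # 1994 (the strike year) is set to NA
  df
}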
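As a quick check of the year-to-CBA mapping, the same breaks can be paired with a lookup vector. The helper cba_for_year is a hypothetical, vector-based equivalent of addcba, written only to make the mapping easy to inspect.

```r
# Same breaks as addcba; each bin maps to a CBA number, with the strike year as NA
cba_for_year <- function(year) {
  breaks <- c(0, 1993.5, 1994.5, 2002, 2006, 2010, 10000)
  labels <- c(7, NA, 8, 9, 10, 11)
  labels[.bincode(year, breaks)]
}

cba_for_year(c(1990, 1994, 1995, 2002, 2003, 2007, 2011))
# returns 7 NA 8 8 9 10 11
```

Note that the intervals are closed on the right, so 2002, 2006, and 2010 fall in the earlier period.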

regres <- addcba(regres)

anova.reg <- aov(adj.r.squared ~ as.factor(cba),data=regres)
summary(anova.reg)
##                Df  Sum Sq Mean Sq F value   Pr(>F)    
## as.factor(cba)  4 0.22894 0.05724    36.4 4.39e-10 ***
## Residuals      25 0.03932 0.00157                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
TukeyHSD(anova.reg)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = adj.r.squared ~ as.factor(cba), data = regres)
## 
## $`as.factor(cba)`
##              diff          lwr          upr     p adj
## 8-7    0.06186162  0.003628854  0.120094394 0.0333907
## 9-7   -0.01380545 -0.085125738  0.057514835 0.9784690
## 10-7  -0.13789710 -0.209217387 -0.066576814 0.0000596
## 11-7  -0.16410402 -0.227002599 -0.101205438 0.0000005
## 9-8   -0.07566708 -0.146987362 -0.004346789 0.0336931
## 10-8  -0.19975872 -0.271079011 -0.128438438 0.0000001
## 11-8  -0.22596564 -0.288864223 -0.163067062 0.0000000
## 10-9  -0.12409165 -0.206445222 -0.041738075 0.0014244
## 11-9  -0.15029857 -0.225476750 -0.075120384 0.0000367
## 11-10 -0.02620692 -0.101385102  0.048971265 0.8420085

regres <- addcba(regres)

anova.reg <- aov(adj.r.squared ~ as.factor(cba),data=regres)
summary(anova.reg)
##                Df  Sum Sq Mean Sq F value   Pr(>F)    
## as.factor(cba)  4 0.14687 0.03672   33.12 1.18e-09 ***
## Residuals      25 0.02772 0.00111                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
TukeyHSD(anova.reg)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = adj.r.squared ~ as.factor(cba), data = regres)
## 
## $`as.factor(cba)`
##                diff         lwr         upr     p adj
## 8-7    0.1071798491  0.05828624  0.15607346 0.0000090
## 9-7    0.0114611309 -0.04842107  0.07134333 0.9793432
## 10-7  -0.0748273792 -0.13470958 -0.01494518 0.0092462
## 11-7  -0.0740462640 -0.12685740 -0.02123513 0.0030796
## 9-8   -0.0957187182 -0.15560092 -0.03583652 0.0007211
## 10-8  -0.1820072283 -0.24188943 -0.12212503 0.0000000
## 11-8  -0.1812261131 -0.23403725 -0.12841497 0.0000000
## 10-9  -0.0862885101 -0.15543452 -0.01714250 0.0093555
## 11-9  -0.0855073949 -0.14862878 -0.02238601 0.0043512
## 11-10  0.0007811152 -0.06234027  0.06390250 0.9999996
regres %>% ggplot() + geom_bar(aes(cba,adj.r.squared),stat="identity")# + geom_errorbar()
## Warning: Removed 1 rows containing missing values (position_stack).

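The adjusted R-squared of the yearly regressions differs significantly across CBA periods (F = 36.4 and 33.1, p < 1e-8). It is highest under CBA 8 and significantly lower under CBAs 10 and 11 than under CBAs 7, 8, and 9. This may have to do with the luxury tax, or perhaps with the steroid era.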

salteam2 <- left_join(salteam,names,by=c("teamID","yearID"))
salteam2 <- salteam2 %>% mutate(wlratio = W/L, wpct = W/(W+L))
salteam3 <- salteam2 %>% group_by(yearID) %>% 
  mutate(muavgy = mean(avgsal),sdavgy = sd(avgsal),
         mutoty = mean(totsal),sdtoty = sd(totsal),
         wpd = totsal/W)
cpi2 <- cpi %>% rename(yearID = year)
salteam4 <- merge(salteam3,cpi2,by="yearID")
salteam4 <- salteam4 %>% mutate(wpd1990 = wpd/v1900)

wlres <- salteam2 %>% group_by(yearID) %>% do(tidy(lm(wlratio ~ totsal,data=.)))
wlres <- salteam2 %>% group_by(yearID) %>% do(glance(lm(wlratio ~ p50sal,data=.)))
wlres
## # A tibble: 32 x 12
## # Groups:   yearID [32]
##    yearID   r.squared adj.r.squared     sigma statistic     p.value    df
##     <int>       <dbl>         <dbl>     <dbl>     <dbl>       <dbl> <int>
##  1   1985 0.010728380   -0.03049127 0.3244773 0.2602734 0.614592541     2
##  2   1986 0.013038897   -0.02808448 0.2977107 0.3170677 0.578598308     2
##  3   1987 0.006414849   -0.03498453 0.2541052 0.1549504 0.697324192     2
##  4   1988 0.099804182    0.06229602 0.2930442 2.6608659 0.115899337     2
##  5   1989 0.241350125    0.20973971 0.2165099 7.6351466 0.010813630     2
##  6   1990 0.010524416   -0.03070373 0.2503326 0.2552726 0.617995237     2
##  7   1991 0.061271579    0.02215790 0.2352984 1.5664998 0.222778135     2
##  8   1992 0.007498910   -0.03385530 0.2694470 0.1813336 0.674024235     2
##  9   1993 0.235007108    0.20558430 0.2875715 7.9872439 0.008936571     2
## 10   1994 0.143842252    0.11091311 0.2954301 4.3682354 0.046540300     2
## # ... with 22 more rows, and 5 more variables: logLik <dbl>, AIC <dbl>,
## #   BIC <dbl>, deviance <dbl>, df.residual <int>
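# Yearly standard: payroll is standardized within each season (mutoty, sdtoty)
date1 = seq(1985,2010,5)
date2 = seq(1989,2014,5)
dates = data.frame(date1,date2)

for(i in 1:6){
  # yearID > date1 excludes the first year, so each window covers four seasons (e.g. 1986-1989)
  tmpsal <- salteam3 %>% filter(yearID > dates$date1[[i]] & yearID <= dates$date2[[i]])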
  tmpsal <- tmpsal %>% mutate(stdsal = (totsal - mutoty)/sdtoty)
  wlres <- tmpsal %>% lm(wpct ~ stdsal,data=.)
  print(summary(wlres))
  myplot <- tmpsal %>% ggplot() + geom_point(aes(stdsal,wpct)) + geom_smooth(aes(stdsal,wpct))
  print(myplot)
}
## 
## Call:
## lm(formula = wpct ~ stdsal, data = .)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.170968 -0.041127  0.003113  0.043322  0.154448 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.499950   0.006221  80.363   <2e-16 ***
## stdsal      0.011004   0.006344   1.734   0.0859 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06344 on 102 degrees of freedom
## Multiple R-squared:  0.02865,    Adjusted R-squared:  0.01913 
## F-statistic: 3.008 on 1 and 102 DF,  p-value: 0.08585
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wpct ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.14919 -0.05129 -0.01026  0.05134  0.17869 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.500060   0.006151  81.299  < 2e-16 ***
## stdsal      0.017996   0.006268   2.871  0.00494 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06392 on 106 degrees of freedom
## Multiple R-squared:  0.07215,    Adjusted R-squared:  0.0634 
## F-statistic: 8.243 on 1 and 106 DF,  p-value: 0.00494
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wpct ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.16329 -0.04518  0.00182  0.03722  0.14106 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.499952   0.005431   92.05  < 2e-16 ***
## stdsal      0.039849   0.005527    7.21 6.57e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0585 on 114 degrees of freedom
## Multiple R-squared:  0.3132, Adjusted R-squared:  0.3071 
## F-statistic: 51.98 on 1 and 114 DF,  p-value: 6.57e-11
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wpct ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20606 -0.05451  0.01146  0.05256  0.20215 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.500000   0.006899  72.477  < 2e-16 ***
## stdsal      0.036670   0.007017   5.226 7.57e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07557 on 118 degrees of freedom
## Multiple R-squared:  0.188,  Adjusted R-squared:  0.1811 
## F-statistic: 27.31 on 1 and 118 DF,  p-value: 7.566e-07
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wpct ~ stdsal, data = .)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.145329 -0.041975  0.002131  0.041144  0.134287 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.499977   0.005218  95.812  < 2e-16 ***
## stdsal      0.029380   0.005308   5.536  1.9e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05716 on 118 degrees of freedom
## Multiple R-squared:  0.2061, Adjusted R-squared:  0.1994 
## F-statistic: 30.64 on 1 and 118 DF,  p-value: 1.904e-07
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wpct ~ stdsal, data = .)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.149096 -0.038805  0.007312  0.052624  0.114660 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.499994   0.006038  82.801  < 2e-16 ***
## stdsal      0.021162   0.006142   3.446  0.00079 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06615 on 118 degrees of freedom
## Multiple R-squared:  0.09141,    Adjusted R-squared:  0.08371 
## F-statistic: 11.87 on 1 and 118 DF,  p-value: 0.0007899
## `geom_smooth()` using method = 'loess'

# 5 year standard: payroll is standardized across each multi-year window instead of within each season
date1 = seq(1985,2010,5)
date2 = seq(1989,2014,5)
dates = data.frame(date1,date2)

for(i in 1:6){
  tmpsal <- salteam2 %>% filter(yearID > dates$date1[[i]] & yearID <= dates$date2[[i]])
  meansdsal <- tmpsal %>% summarize(meansal = mean(totsal), sdsal = sd(totsal))
  tmpsal <- tmpsal %>% mutate(stdsal = (totsal - meansdsal$meansal)/meansdsal$sdsal)
  wlres <- tmpsal %>% lm(wpct ~ stdsal,data=.)
  print(summary(wlres))
  myplot <- tmpsal %>% ggplot() + geom_point(aes(stdsal,wpct)) + geom_smooth(aes(stdsal,wpct))
  print(myplot)
}
## 
## Call:
## lm(formula = wpct ~ stdsal, data = .)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.172902 -0.040316  0.003635  0.042368  0.154907 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.499829   0.006214  80.440   <2e-16 ***
## stdsal      0.010701   0.005914   1.809   0.0734 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06336 on 102 degrees of freedom
## Multiple R-squared:  0.03109,    Adjusted R-squared:  0.0216 
## F-statistic: 3.273 on 1 and 102 DF,  p-value: 0.07335
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wpct ~ stdsal, data = .)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.148902 -0.047993  0.000505  0.046672  0.163294 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.498574   0.005951  83.773  < 2e-16 ***
## stdsal      0.019611   0.004829   4.061 9.39e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06173 on 106 degrees of freedom
## Multiple R-squared:  0.1346, Adjusted R-squared:  0.1265 
## F-statistic: 16.49 on 1 and 106 DF,  p-value: 9.39e-05
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wpct ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.15143 -0.04235  0.00053  0.04179  0.16215 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.496562   0.005639  88.054  < 2e-16 ***
## stdsal      0.028600   0.004450   6.427 3.13e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06047 on 114 degrees of freedom
## Multiple R-squared:  0.266,  Adjusted R-squared:  0.2595 
## F-statistic: 41.31 on 1 and 114 DF,  p-value: 3.129e-09
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wpct ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20521 -0.05467  0.01017  0.05316  0.20306 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.499105   0.006890   72.44  < 2e-16 ***
## stdsal      0.036616   0.006948    5.27 6.24e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07545 on 118 degrees of freedom
## Multiple R-squared:  0.1905, Adjusted R-squared:  0.1837 
## F-statistic: 27.77 on 1 and 118 DF,  p-value: 6.242e-07
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wpct ~ stdsal, data = .)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.146966 -0.040119  0.005833  0.039018  0.133624 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.499374   0.005238  95.334  < 2e-16 ***
## stdsal      0.028332   0.005209   5.439 2.94e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05737 on 118 degrees of freedom
## Multiple R-squared:  0.2005, Adjusted R-squared:  0.1937 
## F-statistic: 29.59 on 1 and 118 DF,  p-value: 2.938e-07
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wpct ~ stdsal, data = .)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.153625 -0.041106  0.005691  0.053141  0.114145 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.499030   0.006087  81.988  < 2e-16 ***
## stdsal      0.017940   0.005633   3.185  0.00185 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06659 on 118 degrees of freedom
## Multiple R-squared:  0.07916,    Adjusted R-squared:  0.07136 
## F-statistic: 10.14 on 1 and 118 DF,  p-value: 0.001852
## `geom_smooth()` using method = 'loess'
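
The loops that follow standardize payroll in two ways: against each season's own mean and SD (the `mutoty` and `sdtoty` columns), or against a single mean and SD pooled over the whole window. The sketch below illustrates the difference on simulated data. It assumes that `mutoty` and `sdtoty` are per-year summaries of `totsal`; the data frame `toy` and the columns `std_yearly` and `std_pooled` are hypothetical names, not part of the original analysis.

```r
# Toy data standing in for the team-season table; all values are simulated.
set.seed(1)
toy <- data.frame(
  yearID = rep(1986:1989, each = 26),
  totsal = exp(rnorm(104, mean = 16, sd = 0.4))
)
# Simulate payroll inflation: later seasons are scaled up.
toy$totsal <- toy$totsal * (1 + 0.15 * (toy$yearID - 1986))

# Yearly standardization: each season gets its own mean and SD.
toy$mutoty <- ave(toy$totsal, toy$yearID, FUN = mean)
toy$sdtoty <- ave(toy$totsal, toy$yearID, FUN = sd)
toy$std_yearly <- (toy$totsal - toy$mutoty) / toy$sdtoty

# Pooled standardization: one mean and SD for the whole window.
toy$std_pooled <- (toy$totsal - mean(toy$totsal)) / sd(toy$totsal)

# Mean z-score by season under each scheme.
yearly_means <- tapply(toy$std_yearly, toy$yearID, mean)
pooled_means <- tapply(toy$std_pooled, toy$yearID, mean)
print(round(cbind(yearly = yearly_means, pooled = pooled_means), 3))
```

Under the yearly scheme every season is centered at zero, so a team is compared only with its contemporaries. Under the pooled scheme, league-wide payroll growth within the window shifts later seasons upward, which may partly explain why the two sets of regressions differ.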

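# Yearly standard: z-score total payroll with each season's own mean and SD
# (mutoty and sdtoty are presumably the per-year mean and SD stored in salteam3).
date1 <- seq(1985, 2010, 5)
date2 <- seq(1989, 2014, 5)
dates <- data.frame(date1, date2)

for (i in 1:6) {
  # The strict `>` drops date1[i], so each window spans four seasons (e.g. 1986-1989).
  tmpsal <- salteam3 %>% filter(yearID > dates$date1[[i]] & yearID <= dates$date2[[i]])
  tmpsal <- tmpsal %>% mutate(stdsal = (totsal - mutoty)/sdtoty)
  # Regress final standings rank on standardized payroll (a lower Rank is a better finish).
  wlres <- tmpsal %>% lm(Rank ~ stdsal, data = .)
  print(summary(wlres))
  # Scatter plot with a loess smoother.
  myplot <- tmpsal %>% ggplot() + geom_point(aes(stdsal, Rank)) + geom_smooth(aes(stdsal, Rank))
  print(myplot)
}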
## 
## Call:
## lm(formula = Rank ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0924 -1.6187  0.0335  1.6093  3.4680 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.7308     0.1824  20.456   <2e-16 ***
## stdsal       -0.3406     0.1860  -1.831     0.07 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.86 on 102 degrees of freedom
## Multiple R-squared:  0.03183,    Adjusted R-squared:  0.02234 
## F-statistic: 3.354 on 1 and 102 DF,  p-value: 0.06997
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = Rank ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1463 -1.4999 -0.3132  1.5823  3.9737 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.5370     0.1745  20.269   <2e-16 ***
## stdsal       -0.3701     0.1778  -2.081   0.0398 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.813 on 106 degrees of freedom
## Multiple R-squared:  0.03926,    Adjusted R-squared:  0.0302 
## F-statistic: 4.332 on 1 and 106 DF,  p-value: 0.03981
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = Rank ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4587 -1.0231 -0.0375  1.0505  3.4596 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.9483     0.1178  25.024  < 2e-16 ***
## stdsal       -0.6690     0.1199  -5.579 1.65e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.269 on 114 degrees of freedom
## Multiple R-squared:  0.2145, Adjusted R-squared:  0.2076 
## F-statistic: 31.13 on 1 and 114 DF,  p-value: 1.653e-07
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = Rank ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7496 -1.1089  0.1419  1.0274  3.0226 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0333     0.1223  24.804  < 2e-16 ***
## stdsal       -0.6439     0.1244  -5.177 9.38e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.34 on 118 degrees of freedom
## Multiple R-squared:  0.1851, Adjusted R-squared:  0.1782 
## F-statistic:  26.8 on 1 and 118 DF,  p-value: 9.376e-07
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = Rank ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7036 -1.1284 -0.0261  1.1088  3.2836 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0167     0.1236  24.398  < 2e-16 ***
## stdsal       -0.5685     0.1258  -4.521 1.47e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.354 on 118 degrees of freedom
## Multiple R-squared:  0.1476, Adjusted R-squared:  0.1404 
## F-statistic: 20.44 on 1 and 118 DF,  p-value: 1.474e-05
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = Rank ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4496 -1.1248 -0.1485  0.9314  2.7751 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0250     0.1289  23.474  < 2e-16 ***
## stdsal       -0.3688     0.1311  -2.814  0.00574 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.412 on 118 degrees of freedom
## Multiple R-squared:  0.06288,    Adjusted R-squared:  0.05493 
## F-statistic: 7.917 on 1 and 118 DF,  p-value: 0.005739
## `geom_smooth()` using method = 'loess'

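# Window standard: z-score total payroll with one mean and SD pooled over the whole window.
date1 <- seq(1985, 2010, 5)
date2 <- seq(1989, 2014, 5)
dates <- data.frame(date1, date2)

for (i in 1:6) {
  # The strict `>` drops date1[i], so each window spans four seasons (e.g. 1986-1989).
  tmpsal <- salteam2 %>% filter(yearID > dates$date1[[i]] & yearID <= dates$date2[[i]])
  # Mean and SD of total payroll across all team-seasons in the window.
  meansdsal <- tmpsal %>% summarize(meansal = mean(totsal), sdsal = sd(totsal))
  tmpsal <- tmpsal %>% mutate(stdsal = (totsal - meansdsal$meansal)/meansdsal$sdsal)
  # Regress final standings rank on standardized payroll (a lower Rank is a better finish).
  wlres <- tmpsal %>% lm(Rank ~ stdsal, data = .)
  print(summary(wlres))
  # Scatter plot with a loess smoother.
  myplot <- tmpsal %>% ggplot() + geom_point(aes(stdsal, Rank)) + geom_smooth(aes(stdsal, Rank))
  print(myplot)
}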
## 
## Call:
## lm(formula = Rank ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2970 -1.6387 -0.0339  1.6072  3.6072 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.7345     0.1822  20.500   <2e-16 ***
## stdsal       -0.3294     0.1734  -1.899   0.0603 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.858 on 102 degrees of freedom
## Multiple R-squared:  0.03416,    Adjusted R-squared:  0.02469 
## F-statistic: 3.608 on 1 and 102 DF,  p-value: 0.06033
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = Rank ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1544 -1.4275 -0.1564  1.2422  4.0194 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.5734     0.1689  21.161  < 2e-16 ***
## stdsal       -0.4798     0.1370  -3.502 0.000678 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.752 on 106 degrees of freedom
## Multiple R-squared:  0.1037, Adjusted R-squared:  0.09523 
## F-statistic: 12.26 on 1 and 106 DF,  p-value: 0.0006778
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = Rank ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4576 -0.9606 -0.0107  0.9254  3.6188 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.00601    0.12036  24.976  < 2e-16 ***
## stdsal      -0.48703    0.09497  -5.128 1.21e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.291 on 114 degrees of freedom
## Multiple R-squared:  0.1874, Adjusted R-squared:  0.1803 
## F-statistic:  26.3 on 1 and 114 DF,  p-value: 1.21e-06
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = Rank ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7160 -1.1112  0.1837  0.9506  2.8982 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0492     0.1218  25.027  < 2e-16 ***
## stdsal       -0.6499     0.1229  -5.289 5.73e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.334 on 118 degrees of freedom
## Multiple R-squared:  0.1917, Adjusted R-squared:  0.1848 
## F-statistic: 27.98 on 1 and 118 DF,  p-value: 5.729e-07
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = Rank ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.68509 -1.13595  0.01056  1.02661  3.16204 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0282     0.1241  24.394  < 2e-16 ***
## stdsal       -0.5436     0.1234  -4.404 2.35e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.36 on 118 degrees of freedom
## Multiple R-squared:  0.1412, Adjusted R-squared:  0.1339 
## F-statistic: 19.39 on 1 and 118 DF,  p-value: 2.351e-05
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = Rank ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4240 -1.1755 -0.1448  0.9357  2.6941 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0421     0.1295  23.496  < 2e-16 ***
## stdsal       -0.3180     0.1198  -2.654  0.00905 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.417 on 118 degrees of freedom
## Multiple R-squared:  0.05633,    Adjusted R-squared:  0.04834 
## F-statistic: 7.044 on 1 and 118 DF,  p-value: 0.009048
## `geom_smooth()` using method = 'loess'

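# Yearly standard: z-score total payroll with each season's own mean and SD
# (mutoty and sdtoty are presumably the per-year mean and SD stored in salteam3).
date1 <- seq(1985, 2010, 5)
date2 <- seq(1989, 2014, 5)
dates <- data.frame(date1, date2)

for (i in 1:6) {
  # The strict `>` drops date1[i], so each window spans four seasons (e.g. 1986-1989).
  tmpsal <- salteam3 %>% filter(yearID > dates$date1[[i]] & yearID <= dates$date2[[i]])
  tmpsal <- tmpsal %>% mutate(stdsal = (totsal - mutoty)/sdtoty)
  # Regress the win/loss ratio on standardized payroll.
  wlres <- tmpsal %>% lm(wlratio ~ stdsal, data = .)
  print(summary(wlres))
  # Scatter plot with a loess smoother.
  myplot <- tmpsal %>% ggplot() + geom_point(aes(stdsal, wlratio)) + geom_smooth(aes(stdsal, wlratio))
  print(myplot)
}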
## 
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.55857 -0.17972 -0.01983  0.15692  0.90961 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.03342    0.02611  39.586   <2e-16 ***
## stdsal       0.05109    0.02662   1.919   0.0578 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2662 on 102 degrees of freedom
## Multiple R-squared:  0.03485,    Adjusted R-squared:  0.02539 
## F-statistic: 3.683 on 1 and 102 DF,  p-value: 0.05777
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.51801 -0.21742 -0.07618  0.19493  0.93268 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.03698    0.02658  39.011  < 2e-16 ***
## stdsal       0.07270    0.02709   2.684  0.00845 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2762 on 106 degrees of freedom
## Multiple R-squared:  0.06362,    Adjusted R-squared:  0.05479 
## F-statistic: 7.202 on 1 and 106 DF,  p-value: 0.00845
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.52790 -0.17900 -0.02483  0.14699  1.05555 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.04262    0.02438  42.758  < 2e-16 ***
## stdsal       0.17596    0.02482   7.091 1.19e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2626 on 114 degrees of freedom
## Multiple R-squared:  0.3061, Adjusted R-squared:    0.3 
## F-statistic: 50.28 on 1 and 114 DF,  p-value: 1.193e-10
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.62279 -0.21269 -0.00028  0.18769  1.40373 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.05802    0.02965  35.684  < 2e-16 ***
## stdsal       0.15825    0.03016   5.248 6.88e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3248 on 118 degrees of freedom
## Multiple R-squared:  0.1892, Adjusted R-squared:  0.1823 
## F-statistic: 27.54 on 1 and 118 DF,  p-value: 6.883e-07
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.52235 -0.15494 -0.01333  0.16523  0.61128 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.03274    0.02112  48.897  < 2e-16 ***
## stdsal       0.12556    0.02148   5.845  4.6e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2314 on 118 degrees of freedom
## Multiple R-squared:  0.2245, Adjusted R-squared:  0.2179 
## F-statistic: 34.16 on 1 and 118 DF,  p-value: 4.596e-08
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4709 -0.1866 -0.0040  0.2036  0.5325 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.03796    0.02447  42.412  < 2e-16 ***
## stdsal       0.08536    0.02489   3.429 0.000834 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2681 on 118 degrees of freedom
## Multiple R-squared:  0.09063,    Adjusted R-squared:  0.08292 
## F-statistic: 11.76 on 1 and 118 DF,  p-value: 0.0008343
## `geom_smooth()` using method = 'loess'

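# Window standard: z-score total payroll with one mean and SD pooled over the whole window.
date1 <- seq(1985, 2010, 5)
date2 <- seq(1989, 2014, 5)
dates <- data.frame(date1, date2)

for (i in 1:6) {
  # The strict `>` drops date1[i], so each window spans four seasons (e.g. 1986-1989).
  tmpsal <- salteam2 %>% filter(yearID > dates$date1[[i]] & yearID <= dates$date2[[i]])
  # Mean and SD of total payroll across all team-seasons in the window.
  meansdsal <- tmpsal %>% summarize(meansal = mean(totsal), sdsal = sd(totsal))
  tmpsal <- tmpsal %>% mutate(stdsal = (totsal - meansdsal$meansal)/meansdsal$sdsal)
  # Regress the win/loss ratio on standardized payroll.
  wlres <- tmpsal %>% lm(wlratio ~ stdsal, data = .)
  print(summary(wlres))
  # Scatter plot with a loess smoother.
  myplot <- tmpsal %>% ggplot() + geom_point(aes(stdsal, wlratio)) + geom_smooth(aes(stdsal, wlratio))
  print(myplot)
}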
## 
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56787 -0.17878 -0.00164  0.15248  0.91129 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.03286    0.02606   39.64   <2e-16 ***
## stdsal       0.05010    0.02480    2.02    0.046 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2657 on 102 degrees of freedom
## Multiple R-squared:  0.03845,    Adjusted R-squared:  0.02903 
## F-statistic: 4.079 on 1 and 102 DF,  p-value: 0.04604
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5206 -0.2016 -0.0310  0.1691  0.8746 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.03055    0.02558  40.287  < 2e-16 ***
## stdsal       0.08483    0.02076   4.087 8.53e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2653 on 106 degrees of freedom
## Multiple R-squared:  0.1361, Adjusted R-squared:  0.128 
## F-statistic:  16.7 on 1 and 106 DF,  p-value: 8.53e-05
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.51892 -0.19926 -0.03073  0.14437  1.14747 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.02755    0.02523  40.721  < 2e-16 ***
## stdsal       0.12711    0.01991   6.384 3.86e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2706 on 114 degrees of freedom
## Multiple R-squared:  0.2634, Adjusted R-squared:  0.2569 
## F-statistic: 40.76 on 1 and 114 DF,  p-value: 3.856e-09
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.60950 -0.23063 -0.00783  0.17570  1.40764 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.05416    0.02961  35.606  < 2e-16 ***
## stdsal       0.15815    0.02986   5.297 5.54e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3242 on 118 degrees of freedom
## Multiple R-squared:  0.1921, Adjusted R-squared:  0.1853 
## F-statistic: 28.06 on 1 and 118 DF,  p-value: 5.536e-07
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5298 -0.1670 -0.0036  0.1459  0.6091 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.03015    0.02118  48.628  < 2e-16 ***
## stdsal       0.12160    0.02107   5.773 6.43e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.232 on 118 degrees of freedom
## Multiple R-squared:  0.2202, Adjusted R-squared:  0.2136 
## F-statistic: 33.32 on 1 and 118 DF,  p-value: 6.434e-08
## `geom_smooth()` using method = 'loess'

## 
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46046 -0.18495 -0.01397  0.20660  0.53015 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.03410    0.02468  41.899  < 2e-16 ***
## stdsal       0.07188    0.02284   3.147  0.00209 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.27 on 118 degrees of freedom
## Multiple R-squared:  0.07744,    Adjusted R-squared:  0.06962 
## F-statistic: 9.904 on 1 and 118 DF,  p-value: 0.002088
## `geom_smooth()` using method = 'loess'
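
The windowed loops above filter with `yearID > date1[i] & yearID <= date2[i]`, which leaves out the first season of each window; the reported residual degrees of freedom (102, 106, 114, 118) are consistent with four seasons per window rather than five. If five-season windows were intended, one possible approach is to label seasons with `cut()` and fit one regression per label. This is only a sketch on simulated data: `toy`, `window`, `fit_window` and `window_fits` are hypothetical names, and the simulated relationship is not an estimate from the Lahman data.

```r
# Simulated team-season data; column names mirror the document, values do not.
set.seed(42)
toy <- expand.grid(team = seq_len(26), yearID = 1985:2014)
toy$stdsal <- rnorm(nrow(toy))
# Hypothetical relationship between standardized payroll and win/loss ratio.
toy$wlratio <- 1 + 0.1 * toy$stdsal + rnorm(nrow(toy), sd = 0.25)

# Right-closed breaks give (1984,1989], (1989,1994], ..., (2009,2014],
# so every season from 1985 to 2014 falls into exactly one window.
toy$window <- cut(
  toy$yearID,
  breaks = seq(1984, 2014, by = 5),
  labels = paste(seq(1985, 2010, 5), seq(1989, 2014, 5), sep = "-")
)

# Fit one regression per window and keep the slope row plus R-squared.
fit_window <- function(d) {
  s <- summary(lm(wlratio ~ stdsal, data = d))
  data.frame(
    n         = nrow(d),
    slope     = s$coefficients["stdsal", "Estimate"],
    std_error = s$coefficients["stdsal", "Std. Error"],
    p_value   = s$coefficients["stdsal", "Pr(>|t|)"],
    r_squared = s$r.squared
  )
}
window_fits <- do.call(rbind, lapply(split(toy, toy$window), fit_window))
print(window_fits)
```

Collecting the slope and R-squared in one table also makes the window-to-window comparison easier to read than six separate `summary()` printouts.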

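# Yearly regressions of win percentage on standardized total payroll.
# glance() (broom package) returns one row of fit statistics per season.
tmpsal <- salteam3 %>% mutate(stdsal = (totsal - mutoty)/sdtoty)
reg_totsal <- tmpsal %>% group_by(yearID) %>% do(glance(lm(wpct ~ stdsal,data=.)))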
reg_totsal
## # A tibble: 32 x 12
## # Groups:   yearID [32]
##    yearID    r.squared adj.r.squared      sigma    statistic    p.value
##     <int>        <dbl>         <dbl>      <dbl>        <dbl>      <dbl>
##  1   1985 1.179438e-01   0.081191444 0.07456162 3.209150e+00 0.08584779
##  2   1986 4.768197e-02   0.008002049 0.06298925 1.201665e+00 0.28386748
##  3   1987 2.671143e-03  -0.038884226 0.06175368 6.427913e-02 0.80201543
##  4   1988 2.656597e-02  -0.013993778 0.07475266 6.549836e-01 0.42628770
##  5   1989 1.210865e-01   0.084465061 0.05840066 3.306440e+00 0.08151193
##  6   1990 1.013029e-06  -0.041665611 0.05701009 2.431271e-05 0.99610657
##  7   1991 5.445002e-02   0.015052109 0.05925664 1.382053e+00 0.25128263
##  8   1992 1.068086e-03  -0.040554077 0.06433800 2.566146e-02 0.87407052
##  9   1993 1.243558e-01   0.090677203 0.07153848 3.692426e+00 0.06568576
## 10   1994 1.682955e-01   0.136306831 0.06355227 5.261102e+00 0.03013797
## # ... with 22 more rows, and 6 more variables: df <int>, logLik <dbl>,
## #   AIC <dbl>, BIC <dbl>, deviance <dbl>, df.residual <int>
# Same yearly regressions, using standardized average salary instead of total payroll.
tmpsal <- salteam3 %>% mutate(stdsal = (avgsal - muavgy)/sdavgy)
reg_avgsal <- tmpsal %>% group_by(yearID) %>% do(glance(lm(wpct ~ stdsal,data=.)))
reg_avgsal
## # A tibble: 32 x 12
## # Groups:   yearID [32]
##    yearID   r.squared adj.r.squared      sigma  statistic    p.value    df
##     <int>       <dbl>         <dbl>      <dbl>      <dbl>      <dbl> <int>
##  1   1985 0.043305297   0.003443018 0.07765222 1.08637282 0.30766239     2
##  2   1986 0.076376525   0.037892214 0.06203302 1.98461457 0.17172974     2
##  3   1987 0.001131412  -0.040488112 0.06180133 0.02718465 0.87042113     2
##  4   1988 0.074519865   0.035958193 0.07288815 1.93248532 0.17724633     2
##  5   1989 0.178082790   0.143836240 0.05647533 5.20002127 0.03176086     2
##  6   1990 0.000655151  -0.040984218 0.05699145 0.01573393 0.90122402     2
##  7   1991 0.074751769   0.036199759 0.05861704 1.93898501 0.17654687     2
##  8   1992 0.002920803  -0.038624163 0.06427831 0.07030462 0.79315789     2
##  9   1993 0.121958251   0.088187415 0.07163635 3.61134824 0.06852931     2
## 10   1994 0.209359433   0.178950180 0.06196352 6.88472800 0.01435708     2
## # ... with 22 more rows, and 5 more variables: logLik <dbl>, AIC <dbl>,
## #   BIC <dbl>, deviance <dbl>, df.residual <int>
# tidy() (broom package) returns coefficient rows; keep only the slope on stdsal for each season.
tmpsal <- salteam3 %>% mutate(stdsal = (avgsal - muavgy)/sdavgy)
reg_avgsal2 <- tmpsal %>% group_by(yearID) %>% do(tidy(lm(wpct ~ stdsal,data=.)))
reg_avgsal2 <- reg_avgsal2 %>% filter(term=="stdsal")
reg_avgsal2
## # A tibble: 32 x 6
## # Groups:   yearID [32]
##    yearID   term     estimate  std.error  statistic    p.value
##     <int>  <chr>        <dbl>      <dbl>      <dbl>      <dbl>
##  1   1985 stdsal  0.016187259 0.01553044  1.0422921 0.30766239
##  2   1986 stdsal  0.017477970 0.01240660  1.4087635 0.17172974
##  3   1987 stdsal  0.002037932 0.01236027  0.1648777 0.87042113
##  4   1988 stdsal  0.020264927 0.01457763  1.3901386 0.17724633
##  5   1989 stdsal  0.025756764 0.01129507  2.2803555 0.03176086
##  6   1990 stdsal  0.001429744 0.01139829  0.1254350 0.90122402
##  7   1991 stdsal  0.016324546 0.01172341  1.3924744 0.17654687
##  8   1992 stdsal -0.003408681 0.01285566 -0.2651502 0.79315789
##  9   1993 stdsal  0.026199094 0.01378642  1.9003548 0.06852931
## 10   1994 stdsal  0.031289427 0.01192489  2.6238765 0.01435708
## # ... with 22 more rows
# Yearly slopes with final standings rank as the response (a negative slope means a better finish).
tmpsal <- salteam3 %>% mutate(stdsal = (avgsal - muavgy)/sdavgy)
reg_avgsal3a <- tmpsal %>% group_by(yearID) %>% do(tidy(lm(Rank ~ stdsal,data=.)))
reg_avgsal3a <- reg_avgsal3a %>% filter(term=="stdsal")
reg_avgsal3a
## # A tibble: 32 x 6
## # Groups:   yearID [32]
##    yearID   term   estimate std.error  statistic    p.value
##     <int>  <chr>      <dbl>     <dbl>      <dbl>      <dbl>
##  1   1985 stdsal -0.4248347 0.3808463 -1.1155016 0.27568085
##  2   1986 stdsal -0.4452098 0.3833190 -1.1614601 0.25687641
##  3   1987 stdsal -0.1981250 0.3754045 -0.5277639 0.60250898
##  4   1988 stdsal -0.4917181 0.3809434 -1.2907906 0.20907270
##  5   1989 stdsal -0.8780452 0.3496219 -2.5114134 0.01916067
##  6   1990 stdsal -0.1544620 0.3850151 -0.4011842 0.69183414
##  7   1991 stdsal -0.1040378 0.3975860 -0.2616738 0.79580564
##  8   1992 stdsal  0.1019285 0.3813981  0.2672496 0.79156015
##  9   1993 stdsal -0.7841809 0.3644724 -2.1515507 0.04089753
## 10   1994 stdsal -0.4614412 0.2513726 -1.8356863 0.07786736
## # ... with 22 more rows
# Fit statistics for the yearly Rank regressions.
tmpsal <- salteam3 %>% mutate(stdsal = (avgsal - muavgy)/sdavgy)
reg_avgsal3b <- tmpsal %>% group_by(yearID) %>% do(glance(lm(Rank ~ stdsal,data=.)))
reg_avgsal3b
## # A tibble: 32 x 12
## # Groups:   yearID [32]
##    yearID   r.squared adj.r.squared    sigma  statistic    p.value    df
##     <int>       <dbl>         <dbl>    <dbl>      <dbl>      <dbl> <int>
##  1   1985 0.049291985   0.009679151 1.904232 1.24434380 0.27568085     2
##  2   1986 0.053216699   0.013767395 1.916595 1.34898955 0.25687641     2
##  3   1987 0.011472470  -0.029716177 1.877023 0.27853476 0.60250898     2
##  4   1988 0.064915887   0.025954049 1.904717 1.66614027 0.20907270     2
##  5   1989 0.208108890   0.175113427 1.748110 6.30719715 0.01916067     2
##  6   1990 0.006661526  -0.034727577 1.925076 0.16094879 0.69183414     2
##  7   1991 0.002844932  -0.038703196 1.987930 0.06847316 0.79580564     2
##  8   1992 0.002967101  -0.038575936 1.906991 0.07142234 0.79156015     2
##  9   1993 0.151136009   0.118487394 1.893854 4.62917061 0.04089753     2
## 10   1994 0.114735226   0.080686581 1.306170 3.36974424 0.07786736     2
## # ... with 22 more rows, and 5 more variables: logLik <dbl>, AIC <dbl>,
## #   BIC <dbl>, deviance <dbl>, df.residual <int>
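
With only one observation per team per season, the single-year fits above are noisy: most of the yearly p-values shown exceed 0.05 even though the multi-year windows give clearly significant slopes. One way to make that uncertainty visible is to report a confidence interval next to each yearly slope. The sketch below does this in base R on simulated data; `toy`, `slope_by_year` and `yearly_slopes` are hypothetical names, and the simulated effect size is an assumption, not a result from the Lahman data.

```r
# Simulated team-season data; values are illustrative only.
set.seed(7)
toy <- expand.grid(team = seq_len(26), yearID = 1985:1990)
toy$stdsal <- rnorm(nrow(toy))
toy$wpct <- 0.5 + 0.02 * toy$stdsal + rnorm(nrow(toy), sd = 0.06)

# For one season: fit wpct ~ stdsal and return the slope with a 95% interval.
slope_by_year <- function(d) {
  fit <- lm(wpct ~ stdsal, data = d)
  ci <- confint(fit, "stdsal", level = 0.95)
  data.frame(
    yearID    = d$yearID[1],
    estimate  = unname(coef(fit)["stdsal"]),
    conf_low  = ci[1, 1],
    conf_high = ci[1, 2],
    n         = nrow(d)
  )
}

# Split by season, fit each piece, and stack the results into one table.
yearly_slopes <- do.call(rbind, lapply(split(toy, toy$yearID), slope_by_year))
rownames(yearly_slopes) <- NULL
print(yearly_slopes)
```

An interval that spans zero in a given season does not contradict the windowed results; it mostly reflects the small number of teams per year.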

[Figures omitted: panels labeled tot1, avg1, 2, 3a, 3b]

Predict awards

Conclusion